On the Reduction of Biases in Big Data Sets for the Detection of Irregular Power Usage
نویسندگان
چکیده
In machine learning, a bias occurs whenever training sets are not representative for the test data, which results in unreliable models. The most common biases in data are arguably class imbalance and covariate shift. In this work, we aim to shed light on this topic in order to increase the overall attention to this issue in the field of machine learning. We propose a scalable novel framework for reducing multiple biases in high-dimensional data sets in order to train more reliable predictors. We apply our methodology to the detection of irregular power usage from real, noisy industrial data. In emerging markets, irregular power usage, and electricity theft in particular, may range up to 40% of the total electricity distributed. Biased data sets are of particular issue in this domain. We show that reducing these biases increases the accuracy of the trained predictors. Our models have the potential to generate significant economic value in a real world application, as they are being deployed in a commercial software for the detection of irregular power usage.
منابع مشابه
Fast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies
Collecting insurance fraud samples is costly and if performed manually is very time consuming. This issue suggests usage of unsupervised models. One of the accurate methods in this regards is Spectral Ranking of Anomalies (SRA) that is shown to work better than other methods for auto insurance fraud detection specifically. However, this approach is not scalable to large samples and is not appro...
متن کامل3D Detection of Power-Transmission Lines in Point Clouds Using Random Forest Method
Inspection of power transmission lines using classic experts based methods suffers from disadvantages such as highel level of time and money consumption. Advent of UAVs and their application in aerial data gathering help to decrease the time and cost promenantly. The purpose of this research is to present an efficient automated method for inspection of power transmission lines based on point c...
متن کاملSubmitting an automation method for detection cavitations in hydro turbines considering sensitivity parameters (Sefidroud hydroelectric power plant dam)
In this research, submitting a method for evaluation of detection cavitation specifications and also automation of cavitation threshold has been investigated. The case study was based on Kaplan hydro turbine data located on Tarik hydropower plant at Sefidroud dam. The foundation of method was employment MATLAB program, sensor classification sensor locations and cavitation sensitivity. For train...
متن کاملBehavior-Based Online Anomaly Detection for a Nationwide Short Message Service
As fraudsters understand the time window and act fast, real-time fraud management systems becomes necessary in Telecommunication Industry. In this work, by analyzing traces collected from a nationwide cellular network over a period of a month, an online behavior-based anomaly detection system is provided. Over time, users' interactions with the network provides a vast amount of usage data. Thes...
متن کاملSurvey on Perception of People Regarding Utilization of Computer Science & Information Technology in Manipulation of Big Data, Disease Detection & Drug Discovery
this research explores the manipulation of biomedical big data and diseases detection using automated computing mechanisms. As efficient and cost effective way to discover disease and drug is important for a society so computer aided automated system is a must. This paper aims to understand the importance of computer aided automated system among the people. The analysis result from collected da...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1801.05627 شماره
صفحات -
تاریخ انتشار 2018